Abstract

We consider the problem of self-healing in reconfigurable networks (e.g., peer-to-peer and
wireless mesh networks) that are under repeated attack by an omniscient adversary and propose
a fully distributed algorithm, Xheal, that maintains good expansion and spectral properties of
the network while keeping the network connected. Moreover, Xheal does this while allowing only
low stretch and degree increase per node. The algorithm heals global properties like expansion
and stretch while making only local changes and using only local information. We use a model
similar to that used in recent work on self-healing. In our model, over a sequence of rounds, an
adversary either inserts a node with arbitrary connections or deletes an arbitrary node from the
network. The network responds with quick "repairs," which consist of adding or deleting edges in
an efficient localized manner.

These repairs preserve the edge expansion, spectral gap, and network stretch after adversarial
deletions, without increasing node degrees by too much, in the following sense. At any
point in the algorithm, the expansion of the graph will be either 'better' than the expansion of
the graph formed by considering only the adversarial insertions (not the adversarial deletions)
or the expansion will be at least a constant. Also, the stretch, i.e., the distance between any
pair of nodes in the healed graph, is no more than O(log n) times their distance in the graph
with adversarial insertions only. Similarly, at any point, a
node v whose degree would have been d in the graph with adversarial insertions only will have
degree at most O(κd) in the actual graph, for a small parameter κ. We also provide bounds on
the second smallest eigenvalue of the Laplacian, which captures key properties such as mixing
time, conductance, and congestion in routing. Our distributed data structure has low amortized
latency and bandwidth requirements. Our work improves over the self-healing algorithms
Forgiving tree [PODC 2008] and Forgiving graph [PODC 2009] in that we are able to give guarantees
on degree and stretch, while at the same time preserving the expansion and spectral properties
of the network.



1 Introduction 



Networks in the modern age have grown to such an extent that they have now begun to resemble
self-governing living entities. Centralized control and management of resources has become increasingly
untenable. Distributed and localized attainment of self-* properties is fast becoming the need of
the hour.

As we have seen the baby Internet grow through its adolescence into a strapping teenager, 
we have experienced and are experiencing many of its growth pangs and tantrums. There have 
been recent disruptions of services in networks such as Google, Twitter, Facebook and Skype. On
August 15, 2007 the Skype network crashed for about 48 hours, disrupting service to approximately 
200 million users [21, 22, 28]. Skype attributed this outage to failures in their "self-healing
mechanisms" [2]. We believe that this outage is indicative of the unprecedented complexity of 
modern computer systems: we are approaching scales of billions of components. Unfortunately, 
current algorithms ensure robustness in computer networks through the increasingly unscalable 
approach of hardening individual components or, at best, adding lots of redundant components. 
Such designs are increasingly unviable. No living organism is designed such that no component of 
it ever fails: there are simply too many components. For example, skin can be cut and still heal. 
It is much more practical to design skin that can heal than a skin that is completely impervious to 
attack. 

This paper adopts a responsive approach, in the sense that it responds to an attack (or
component failure) by changing the topology of the network. This approach works irrespective of the
initial state of the network, and is thus orthogonal and complementary to traditional non-responsive
techniques. This approach requires the network to be reconfigurable, in the sense that the topology
of the network can be changed. Many important networks are reconfigurable. Some of these we
have designed ourselves, e.g., peer-to-peer, wireless mesh and ad-hoc computer networks, and infrastructure
networks such as an airline's transportation network. Others have long existed but have
only recently come under close scrutiny, e.g., social networks such as friendship networks on social
networking sites, and biological networks, including the human brain. Most of them are also dynamic,
due to the capacity of individual nodes to initiate new connections or drop existing connections.

In this setting, our paper seeks to address the important and challenging problem of efficiently
and responsively maintaining global invariants in a localized, distributed manner. Optimizing
several properties at the same time is a significant challenge, especially with only local
knowledge. For example, a star topology achieves the lowest
distance between nodes, but the central node has the highest degree. If we were trying to give the
lowest degrees to the nodes in a connected graph, they would be connected in a line/cycle, giving
the maximum possible diameter. Tree structures give a good compromise between degree increase
and distances, but may lead to poor spectral properties (expansion) and poor load balancing. Our
main contribution is a self-healing algorithm Xheal that maintains spectral properties (expansion),
connectivity, and stretch in a distributed manner using only localized information and actions,
while allowing only a small degree increase per node. Our main algorithm is described in Section 3.

Our Model: Our model, which is similar to the model introduced in [15, 31], is briefly described
here. We assume that the network is initially a connected (undirected, simple) graph over n nodes.
An adversary repeatedly attacks the network. This adversary knows the network topology and our
algorithm, and it has the ability to delete arbitrary nodes from the network or insert a new node
in the system which it can connect to any subset of nodes currently in the system. However, we
assume the adversary is constrained in that in any time step it can only delete or insert a single
node. (Our algorithm can be extended to handle multiple insertions/deletions.) The detailed model
is described in Section 2.

Our Results: For a reconfigurable network (e.g., peer-to-peer, wireless mesh networks) that has
both insertions and deletions, let G' be the graph consisting of the original nodes and inserted nodes
without any changes due to deletions. Let n be the number of nodes in G', and G be the present
(healed) graph. Our main result is a new algorithm Xheal that ensures (cf. Theorem 2 in Section 4):
1) Spectral Properties: If G' has expansion equal to or better than a constant, Xheal achieves at
least a constant expansion, else it maintains at least the same expansion as G'. Furthermore, we
show bounds on the second smallest eigenvalue of the Laplacian of G, λ(G), with respect to the
corresponding λ(G'). An important special case of our result is that if G' is a (bounded degree)
expander, then Xheal guarantees that G is also a (bounded degree) expander. We note that such
a guarantee is not provided by the self-healing algorithms of [15, 14]. 2) Stretch: The distance
between any two nodes of the actual network never increases by more than O(log n) times their
distance in G'. 3) Degree: The degree of any node never increases by more than κ times its degree in G',
where κ is a small parameter (which is implementation dependent and can be chosen to be a constant;
cf. Section 5).

Our algorithm is distributed, localized and resource efficient. We introduce the main algorithm
(Section 3) separately from its distributed implementation (Section 5). The high-level idea behind
our algorithm is to put a κ-regular expander between the deleted node and its neighbors. Since
this expander has low degree and constant expansion, intuitively this helps in maintaining good
expansion. However, a key complication in this intuitive approach is efficient implementation while
maintaining bounds on degree and stretch. The parameter κ above is determined by the particular
distributed implementation of an expander that we use. Our construction is randomized, which
guarantees efficient maintenance of an expander under insertion and deletion, albeit at the cost of a
small probability that the graph may not be an expander. This aspect of our implementation can be
improved if one can design efficient distributed constructions that yield expanders deterministically.
(To the best of our knowledge no such construction is known.) In our implementation, for a deletion,
repair takes O(log n) rounds and has amortized complexity that is within O(κ log n) times the best
possible. The formal statements and proofs of these results are in Sections 4 and 5.

Related Work: The work most closely related to ours is [15, 31], which introduces a distributed
data structure, Forgiving Graph, that, in a model similar to ours, maintains low stretch of a
network with constant multiplicative degree increase per node. Xheal is more ambitious
in that it maintains not only similar properties but also the spectral properties (expansion), with
obvious benefits, and it uses different techniques. However, we pay with larger message sizes
and amortized analysis of costs. The works of [15, 31] themselves use models or techniques from
earlier work (including [29]). They put in tree-like structures of nodes in place of the deleted node.
Methods which put in tree-like structures of nodes are likely to be bad for expansion. If the original
network is a star of n + 1 nodes and the central node gets deleted, the repair algorithm puts in a
tree, pulling the expansion down from a constant to O(1/n). Even the algorithms Forgiving tree
[14] and Forgiving graph [15], which put in a tree of virtual nodes (simulated by real nodes) in place
of a deleted node, do not improve the situation. In these algorithms, even though the real network
is an isomorphism of the virtual network, the 'binary search' properties of the virtual trees ensure
a poor cut involving the root of the trees.
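
To make the star example concrete, the following sketch computes the ratio |cut| / |S| for one particular cut of a tree. It assumes, purely for illustration, that the repair tree is a heap-shaped binary tree on nodes 1..n (children of i are 2i and 2i+1); the function names are hypothetical and the actual trees used by the cited algorithms may differ. The set S is the smaller child subtree of the root, which is joined to the rest of the tree by a single edge, so the ratio is an upper bound on the tree's edge expansion. In the star, by contrast, any set of s leaves has exactly s crossing edges.

```python
from fractions import Fraction


def subtree_size(root, n):
    """Number of nodes in the subtree rooted at `root` of a heap-indexed
    binary tree on nodes 1..n (children of i are 2i and 2i + 1)."""
    if root > n:
        return 0
    return 1 + subtree_size(2 * root, n) + subtree_size(2 * root + 1, n)


def root_cut_ratio(n):
    """|cut| / |S| where S is the smaller non-empty child subtree of the root.

    Exactly one edge (to the root) leaves S, so the value returned is an
    upper bound on the edge expansion of the tree.
    """
    sizes = [s for s in (subtree_size(2, n), subtree_size(3, n)) if s > 0]
    return Fraction(1, min(sizes))


if __name__ == "__main__":
    for n in (15, 127, 1023):
        print(n, root_cut_ratio(n))
```

For a full tree on n nodes the bound is 2/(n - 1), which is the O(1/n) behaviour described above.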

The importance of spectral properties is well known [5, 18]. Many results are based on graphs
having enough expansion or conductance, including recent results in distributed computing on
information spreading [16]. There are only a few papers showing distributed construction of
expander graphs [20, 6, 11]; Law and Siu's construction gives expanders with high probability using
Hamilton cycles, which we use in our implementation.

Many papers have discussed strategies for adding additional capacity or rerouting in anticipation
of failures [3, 26, 33]. Some other results are also responsive in some sense [22] or
have enough built-in redundancy in separate components [12], but all of them have fixed network
topologies. Our approach does not dictate routing paths or require initially placed redundant
components. There is also some research in the physics community on preventing cascading failures
which empirically works well but unfortunately performs very poorly under adversarial attack [17].



1.1 Preliminaries 

Edge Expansion: Let G = (V, E) be an undirected graph and $S \subset V$ be a set of nodes. We
denote $\bar{S} = V - S$. Let $|E_{S,\bar{S}}| = |\{(u,v) \in E \mid u \in S, v \in \bar{S}\}|$ be the number of edges crossing
the cut $(S, \bar{S})$. We define the volume of S to be the sum of the degrees of the vertices in S,
$\mathrm{vol}(S) = \sum_{x \in S} \mathrm{degree}(x)$. The edge expansion $h_G$ of the graph is defined as

$$h_G = \min_{|S| \le |V|/2} \frac{|E_{S,\bar{S}}|}{|S|}.$$

Cheeger constant: A related notion is the Cheeger constant $\phi_G$ of a graph (also called
conductance), defined as follows [5]:

$$\phi_G = \min_{\emptyset \ne S \subset V} \frac{|E_{S,\bar{S}}|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}.$$

The Cheeger constant can be more appropriate for graphs which are very non-regular, since the
denominator takes into account the sum of the degrees of vertices in S, rather than just the size of
S. Note that for k-regular graphs, the Cheeger constant is just the edge expansion divided by k, hence
they are essentially equivalent for regular graphs. However, in general graphs, key properties such as
mixing time and congestion in routing are captured more accurately by the Cheeger constant than
by edge expansion. For example, consider a constant degree expander of n nodes and partition
the vertex set into two equal parts. Make each of the parts a clique. This graph has expansion at
least a constant, but its conductance is O(1/n). Thus while the expander has logarithmic mixing
time, the modified graph has polynomial mixing time.
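
The two definitions can be checked by brute force on very small graphs. The sketch below is only an illustration: it enumerates all vertex subsets (exponential time), represents a graph as a dictionary mapping each node to its set of neighbors, and uses helper names that are not from the paper. It confirms on two regular graphs that the Cheeger constant equals the edge expansion divided by the degree.

```python
from fractions import Fraction
from itertools import combinations


def cut_size(adj, S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for u in S for v in adj[u] if v not in S)


def edge_expansion(adj):
    """min over non-empty S with |S| <= |V|/2 of |E(S, S-bar)| / |S|."""
    nodes = sorted(adj)
    ratios = [
        Fraction(cut_size(adj, set(S)), size)
        for size in range(1, len(nodes) // 2 + 1)
        for S in combinations(nodes, size)
    ]
    return min(ratios)


def conductance(adj):
    """min over non-empty proper S of |E(S, S-bar)| / min(vol S, vol S-bar)."""
    nodes = sorted(adj)
    total = sum(len(adj[u]) for u in nodes)
    ratios = []
    for size in range(1, len(nodes)):
        for S in combinations(nodes, size):
            vol = sum(len(adj[u]) for u in S)
            denom = min(vol, total - vol)
            if denom > 0:  # skip sets whose volume is zero
                ratios.append(Fraction(cut_size(adj, set(S)), denom))
    return min(ratios)


def complete_graph(n):
    return {u: {v for v in range(n) if v != u} for u in range(n)}


def cycle_graph(n):
    return {u: {(u - 1) % n, (u + 1) % n} for u in range(n)}


if __name__ == "__main__":
    for name, g, k in (("K4", complete_graph(4), 3), ("C6", cycle_graph(6), 2)):
        h, phi = edge_expansion(g), conductance(g)
        print(name, "h =", h, "phi =", phi, "h/k =", h / k)
```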

The Cheeger constant is closely related to the second-smallest eigenvalue of the Laplacian
matrix, denoted by $\lambda_G$ (also called the "algebraic connectivity" of the graph). Hence $\lambda_G$, like the
Cheeger constant, captures many key "global" properties of the graph [5]. $\lambda_G$ captures how
"well-connected" the graph is and is strictly greater than 0 (which is always the smallest eigenvalue) if
and only if the graph is connected. For an expander graph, it is a constant (bounded away from
zero). The larger $\lambda_G$ is, the larger the expansion.

Theorem 1 (Cheeger inequality). $2\phi_G \ge \lambda_G \ge \phi_G^2/2$.
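
As a small sanity check of the inequality, consider the complete graph on four nodes. The computation below assumes that the eigenvalue in question is taken from the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$, which is the setting in which this form of the Cheeger inequality is usually stated; that choice is an assumption made for this example.

```latex
% Worked check of the Cheeger inequality on K_4 (3-regular, 4 nodes).
% Assumption: \lambda is the second-smallest eigenvalue of the normalized Laplacian.
\begin{align*}
  \text{Adjacency eigenvalues of } K_4 &: \; 3,\,-1,\,-1,\,-1
     \;\Longrightarrow\; \lambda_{K_4} = 1 - \tfrac{-1}{3} = \tfrac{4}{3}, \\
  \text{Balanced cut } |S| = 2 &: \; |E_{S,\bar S}| = 4,\;
     \mathrm{vol}(S) = \mathrm{vol}(\bar S) = 6
     \;\Longrightarrow\; \tfrac{4}{6} = \tfrac{2}{3}, \\
  \text{Singleton cut } |S| = 1 &: \; |E_{S,\bar S}| = 3,\;
     \min(\mathrm{vol}(S), \mathrm{vol}(\bar S)) = 3
     \;\Longrightarrow\; 1, \\
  \phi_{K_4} &= \tfrac{2}{3}, \\
  2\phi_{K_4} = \tfrac{4}{3} \;\ge\; \lambda_{K_4} &= \tfrac{4}{3}
     \;\ge\; \tfrac{\phi_{K_4}^2}{2} = \tfrac{2}{9}.
\end{align*}
```

Here the upper bound happens to be tight, while the lower bound is far from tight.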



2 Node Insert, Delete, and Network Repair Model 

This model is based on the one introduced in [15, 31]. Somewhat similar models were also
used in [13, 29]. We now describe the details. Let G = G_0 be an arbitrary graph on n nodes,
which represent processors in a distributed network. In each step, the adversary either adds a node 
or deletes a node. After each deletion, the algorithm gets to add some new edges to the graph, 
as well as deleting old ones. At each insertion, the processors follow a protocol to update their 
information. The algorithm's goal is to maintain connectivity in the network, while maintaining 
good expansion properties and keeping the distance between the nodes small. At the same time, 
the algorithm wants to minimize the resources spent on this task, including keeping node degree 
small. We assume that although the adversary has full knowledge of the topology at every step and 






Figure 1: The Node Insert, Delete and Network Repair Model - Distributed View. 



Each node of G_0 is a processor.
Each processor starts with a list of its neighbors in G_0.
Pre-processing: Processors may send messages to and from their neighbors.

for t := 1 to T do
    Adversary deletes or inserts a node v_t from/into G_{t-1}, forming U_t.
    if node v_t is inserted then
        The new neighbors of v_t may update their information and send messages to and
        from their neighbors.
    if node v_t is deleted then
        All neighbors of v_t are informed of the deletion.
        Recovery phase:
        Nodes of U_t may communicate (synchronously, in parallel) with their immediate
        neighbors. These messages are never lost or corrupted, and may contain the
        names of other vertices.
        During this phase, each node may insert edges joining it to any other nodes as
        desired. Nodes may also drop edges from previous rounds if no longer required.
    At the end of this phase, we call the graph G_t.



Success metrics: Minimize the following "complexity" measures:

Consider the graph G'_t, which is the graph, at timestep t, consisting solely of the
original nodes (from G_0) and insertions, without regard to deletions and healings.

1. Degree increase. $\max_{v \in G_t} \mathrm{degree}(v, G_t) / \mathrm{degree}(v, G'_t)$.

2. Edge expansion. $h(G_t) \ge \min(\alpha, \beta\, h(G'_t))$, for constants $\alpha, \beta > 0$.

3. Network stretch. $\max_{x,y \in G_t} \mathrm{dist}(x, y, G_t) / \mathrm{dist}(x, y, G'_t)$, where, for a graph G and nodes x and
   y in G, dist(x, y, G) is the length of the shortest path between x and y in G.

4. Recovery time. The maximum total time for a recovery round, assuming it
   takes a message no more than 1 time unit to traverse any edge and we have
   unlimited local computational power at each node. We assume the LOCAL
   message-passing model, i.e., there is no bound on the size of the message that
   can pass through an edge in a time step.

5. Communication complexity. Amortized number of messages used for recovery.



can add or delete any node it wants, it is oblivious to the random choices made by the self-healing 
algorithm as well as to the communication that takes place between the nodes (in other words, we 
assume private channels between nodes). 

Initially, each processor only knows its neighbors in G_0, and is unaware of the structure of the
rest of G_0. After each deletion or insertion, only the neighbors of the deleted or inserted vertex
are informed that the deletion or insertion has occurred. After this, processors are allowed to
communicate (synchronously) by sending a limited number of messages to their direct neighbors.
We assume that these messages are always sent and received successfully. The processors may also
request new edges be added to the graph. We assume that no other vertex is deleted or inserted
until the end of this round of computation and communication has concluded.
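
The bookkeeping implied by this model can be sketched as follows. This is a centralized toy simulation, not the distributed protocol: the class name `Network` and its methods are hypothetical, graphs are dictionaries mapping a node to its set of neighbors, and the healing step itself is left to the caller. The point is only to show that the healed graph G_t and the insertions-only graph G'_t evolve differently: deletions and repairs touch G_t alone.

```python
class Network:
    """Tracks the healed graph G_t (`g`) and the insertions-only graph G'_t (`g_prime`)."""

    def __init__(self, adj):
        self.g = {u: set(vs) for u, vs in adj.items()}
        self.g_prime = {u: set(vs) for u, vs in adj.items()}

    def insert(self, v, neighbors):
        """Adversarial insertion: v attaches to nodes currently in the system."""
        nbrs = set(neighbors)
        if v in self.g_prime or not nbrs <= set(self.g):
            raise ValueError("node IDs must be new and neighbors must be present")
        for graph in (self.g, self.g_prime):
            graph[v] = set(nbrs)
            for u in nbrs:
                graph[u].add(v)

    def delete(self, v):
        """Adversarial deletion: only G_t changes; returns the informed neighbors."""
        nbrs = self.g.pop(v)
        for u in nbrs:
            self.g[u].discard(v)
        return nbrs

    def add_edge(self, u, v):
        """Repair edge added by the healing algorithm (G_t only)."""
        self.g[u].add(v)
        self.g[v].add(u)

    def drop_edge(self, u, v):
        """Edge dropped by the healing algorithm (G_t only)."""
        self.g[u].discard(v)
        self.g[v].discard(u)


if __name__ == "__main__":
    net = Network({0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}})
    informed = net.delete(0)        # the three leaves learn of the deletion
    net.add_edge(1, 2)              # some repair chosen by a healing algorithm
    print(informed, net.g, net.g_prime)
```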

We also allow a certain amount of pre-processing to be done before the first attack occurs.
In particular, we assume that all nodes have access to some amount of local information. For
example, we assume that each node knows the addresses of all the neighbors of its neighbors (NoN).
More generally, we assume the (synchronous) LOCAL computation model [27] for our analysis.
This is a well studied distributed computing model and has been used to study numerous "local"
problems such as coloring, dominating set, and vertex cover [27]. This model allows arbitrarily
sized messages to go through an edge per time step. In this model the NoN information can be
exchanged in O(1) rounds.

Our goal is to minimize the time (the number of rounds) and the (amortized) message complexity
per deletion (insertion doesn't require any work from the self-healing algorithm). Our model is
summarized in Figure 1.
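
The first and third success metrics of Figure 1 can be computed directly on small instances. The following sketch assumes both graphs are given as dictionaries mapping a node to its set of neighbors and that every pair of surviving nodes is connected in both graphs (pairs that are not are simply skipped, which is a simplification). The function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def bfs_dist(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_increase(g, g_prime):
    """max over surviving v of degree(v, G_t) / degree(v, G'_t)."""
    return max(Fraction(len(g[v]), len(g_prime[v])) for v in g if g_prime[v])


def stretch(g, g_prime):
    """max over surviving pairs of dist(x, y, G_t) / dist(x, y, G'_t)."""
    worst = Fraction(0)
    for x in g:
        d, d_prime = bfs_dist(g, x), bfs_dist(g_prime, x)
        for y in g:
            if y != x and y in d and y in d_prime:
                worst = max(worst, Fraction(d[y], d_prime[y]))
    return worst


if __name__ == "__main__":
    # G': a star with center 0. G: the center was deleted and the leaves
    # were joined in a cycle (an arbitrary repair chosen for the example).
    g_prime = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    g = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
    print(degree_increase(g, g_prime), stretch(g, g_prime))
```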

3 The algorithm 

We give a high-level view of the distributed algorithm, deferring the distributed implementation
details for now (these will be described later in Section 5). The algorithm is summarized in
Algorithm 3.1. To describe the algorithm, we associate a color with each edge of the graph. We will
assume that the original edges of G and those added by the adversary are all colored black
initially. The algorithm can later recolor edges to a color other than black (throughout, when
we say "colored" edge we mean a color other than black), as described below. If (u, v) is a black
(colored) edge, we say that v (u) is a black (colored) neighbor of u (v). Let κ be a fixed parameter
that is implementation dependent (cf. Section 5). For the purposes of this algorithm, we assume
the existence of a κ-regular expander with edge expansion α > 2.

At any time step, the adversary can add a node (with its incident edges) or delete a node (with
its incident edges). Addition is straightforward: the algorithm takes no action, and the added edges
are colored black.

The self-healing algorithm is mainly concerned with what edges to add and/or delete when a 
node is deleted. The algorithm adds/deletes edges based on the colors of the edges deleted as well 
as on other factors as described below. Let v be the deleted node and NBR(v) be the neighbors 
of v in the network after the current deletion. We have the following cases: 

Case 1: All the deleted edges are black edges. In this case, we construct a κ-regular expander
among the neighbor nodes NBR(v) of the deleted node. (If the number of neighbors is less than κ,
then a clique (a complete graph) is constructed among these nodes.) All the edges of this expander
are colored by a unique color, say C_v (e.g., the ID of the deleted node can be chosen as the color,
assuming that every node gets a unique ID whenever it is inserted into the network). Note that the
addition of the expander edges is such that multi-edges are not created. In other words, if a (black)
edge (u, v) is already present, and the expander construction mandates the addition of a (colored)
edge between u and v, then this is done by simply re-coloring the edge to color C_v. Thus our algorithm
does not add multi-edges.
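
A centralized sketch of the Case 1 repair follows. Several choices here are assumptions made for illustration and are not fixed by the text above: the expander is approximated by the union of κ/2 random Hamilton cycles (in the spirit of the Law and Siu construction mentioned in the related work, with κ even), a clique is used when there are at most κ + 1 neighbors, edges are stored in a dictionary from `frozenset({u, v})` to a color, and the function names are hypothetical. Because edges are dictionary keys, an edge that already exists is recolored rather than duplicated.

```python
import random


def make_cloud(nodes, kappa, rng):
    """Edge set of a clique (few nodes) or a union of kappa//2 random Hamilton cycles."""
    nodes = sorted(nodes)
    if len(nodes) <= kappa + 1:
        return {frozenset((u, v)) for i, u in enumerate(nodes) for v in nodes[i + 1:]}
    edges = set()
    for _ in range(kappa // 2):
        cycle = nodes[:]
        rng.shuffle(cycle)
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            edges.add(frozenset((a, b)))  # repeated edges collapse, so degree <= kappa
    return edges


def heal_case1(colors, v, kappa, rng):
    """Delete v (all of whose edges must be black) and build a primary cloud on its neighbors."""
    incident = [e for e in colors if v in e]
    if any(colors[e] != "black" for e in incident):
        raise ValueError("Case 1 applies only when all deleted edges are black")
    nbrs = {u for e in incident for u in e if u != v}
    for e in incident:
        del colors[e]
    for e in make_cloud(nbrs, kappa, rng):
        colors[e] = ("primary", v)  # recolors the edge if it already existed
    return colors


if __name__ == "__main__":
    star = {frozenset((0, i)): "black" for i in range(1, 6)}
    healed = heal_case1(star, 0, kappa=4, rng=random.Random(0))
    print(len(healed), "edges after healing")
```

With five neighbors and κ = 4 the repair is a clique; with more neighbors each node receives between 2 and κ cloud edges.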

We call the expander subgraph constructed in this case among the nodes in NBR(v) a
primary (expander) cloud or simply a primary cloud, and all the (colored) edges in the cloud are
called primary edges. (The term "cloud" is used to capture the fact that the nodes involved are
"close by", i.e., local to each other.) To distinguish primary clouds from secondary clouds
(described later), we assume that all primary colors are different shades of the color red.

Case 2: At least some of the deleted edges are colored edges. In this case, we have two subcases.

Case 2.1: All the deleted colored edges are primary edges. Let the colored edges belong to the
colors C_1, C_2, ..., C_j. This means that the deleted node v belonged to j primary clouds (see Figure
2). There will be κ edges of each color class deleted, since v would have degree κ in each of the
primary expander clouds. If v has black neighbors, then some black edges will also be deleted.
Assume for the sake of simplicity that there are no black neighbors for now. If they are present, they
can be handled in the same manner, as described later.

In this subcase, we do two operations. First, we fix each of the j primary clouds. Each of
these clouds lost a node and so the cloud is no longer a κ-regular expander. We reconstruct a
new κ-regular expander in each of the primary clouds (among the remaining nodes of each cloud).
(This reconstruction is done in an incremental fashion for efficiency reasons; cf. Section 5.) The
colors of the edges of the respective primary clouds are retained. Second, we pick one free node,
if available (free nodes are explained below), from each primary cloud (i.e., there will be j such
nodes picked, one from each primary cloud) and these nodes will be connected together via a (new)
κ-regular expander. (Again, if the number of primary clouds involved is less than or equal to κ + 1,
i.e., j ≤ κ + 1, then a clique will be constructed.) The edges of this expander will have a new
(unique) color of their own. We call the expander subgraph constructed in this case among the j
nodes a secondary (expander) cloud or simply a secondary cloud, and all the (colored) edges in
the cloud are called secondary edges. To identify a secondary cloud, we assume that all secondary
colors are different shades of the color orange.

If the deleted node v has black neighbors, then they are treated similarly: consider each of the
neighbors as a singleton primary cloud and then proceed as above.




Figure 2: A node can be part of many primary clouds. 

Free nodes and their choosing: The nodes of the primary clouds picked to form the secondary
cloud are called non-free nodes. Thus free nodes are nodes that belong only to primary clouds.
We note that a free node can belong to more than one primary cloud (see, e.g., Figure 2). In the
above construction of the secondary cloud, we choose one unique free node from each cloud, i.e.,
if there are j clouds then we choose j different nodes and associate each with one unique primary
cloud (if a free node belongs to two or more primary clouds, we associate it with only one of
them) such that each primary cloud has exactly one free node associated with it. (How this is
implemented is deferred to Section 5.) We call the free node associated with a particular primary
cloud the bridge node that "connects" the primary cloud with the secondary cloud. Note that
our construction implies that any (bridge) node of a primary cloud can belong to at most one
secondary cloud.

What if there are no free nodes associated with a primary cloud, say C? Then we pick a free
node (say w) from another cloud among the j primary clouds (say C') and share the node with
the cloud C. Sharing means adding w to C and forming a new κ-regular expander among the
remaining nodes of C (including w). Thus w will be part of both the clouds C and C'; w will be used
as a free node associated with C for the subsequent repair. Note that this might render C' devoid
of free nodes. To compensate for this, C' gets a free node (if available) from some other cloud
(among the j primary clouds). Thus, in effect, every cloud will have its own free node associated
with it, if there are at least j free nodes (in total) among the j clouds.

There is only one more possibility left to be discussed. If there are fewer than j free nodes
among all the j clouds, then we combine all the j primary clouds into a single primary cloud, i.e.,
we construct a κ-regular expander among all the nodes of the j primary clouds (the previous edges
belonging to the clouds are deleted). The edges of the new cloud will have a new (unique) color
associated with them. Also, all non-free nodes associated with the previous j clouds become free again
in the combined cloud. We note that combining many primary clouds into one primary cloud is
a costly operation (it involves a lot of restructuring). We amortize this costly operation over many
cheaper operations. This is the main intuition behind constructing a secondary expander and free
nodes: constructing a secondary expander is cheaper than combining many primary expanders, and
the cheaper option is unavailable only when free nodes run out (which happens only once in a while).
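
The choice of bridge nodes described above can be sketched centrally as follows. This is only an illustration under simplifying assumptions: clouds are given as a dictionary from a cloud name to its set of (integer) nodes, the set of free nodes is known globally, ties are broken by sorted order, and the names `assign_bridges`, `clouds` and `free_nodes` are hypothetical. The distributed selection deferred to Section 5 is not modeled. The function first lets each cloud take one of its own free nodes, then shares leftover free nodes with clouds that have none, and returns `None` when fewer than j free nodes exist (the case in which the clouds are combined).

```python
def assign_bridges(clouds, free_nodes):
    """Return {cloud name: bridge node} with distinct bridges, or None if the
    j clouds contain fewer than j free nodes in total (merge case)."""
    pool = set().union(*clouds.values()) & set(free_nodes)
    if len(pool) < len(clouds):
        return None
    assignment, used = {}, set()
    # First pass: each cloud takes one of its own unused free nodes, if any.
    for name in sorted(clouds):
        own = sorted((clouds[name] & pool) - used)
        if own:
            assignment[name] = own[0]
            used.add(own[0])
    # Second pass: clouds still without a bridge are given a shared free node
    # from another cloud; the pool has at least j nodes, so one always remains.
    for name in sorted(clouds):
        if name not in assignment:
            shared = min(pool - used)
            assignment[name] = shared
            used.add(shared)
    return assignment


if __name__ == "__main__":
    clouds = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
    print(assign_bridges(clouds, free_nodes={1, 2, 3}))  # C borrows node 3 from B
    print(assign_bridges(clouds, free_nodes={1, 2}))     # too few free nodes: merge
```

In the first call cloud C has no free node of its own and is given one shared from cloud B, mirroring the sharing step; in the second call the function signals that the clouds should be combined.
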
Case 2.2: Some of the deleted edges are secondary edges. In other words, the deleted node, say v,
is a bridge (non-free) node. Let the deleted edges belong to the primary clouds C_1, C_2, ..., C_j
and the secondary cloud F. (Our algorithm guarantees that a bridge node can belong to at most
one secondary cloud.) We handle this deletion as follows. Let v be the bridge node associated
with the primary cloud C_i (one among the j clouds). Without loss of generality, let the secondary
cloud connect a strict subset of j' < j of these primary clouds, possibly together with other (unaffected) primary
clouds. This case is shown in Figure 3. As in Case 2.1, we first fix all the j primary clouds
by constructing a new κ-regular expander among the remaining nodes of each. We then fix the secondary
cloud by finding another free node, say z, from C_i, and reconstructing a new κ-regular secondary
cloud expander on z and the bridge nodes of the other primary clouds of F. The edges retain
their original color. If there are no free nodes among all the primary clouds of F, then
all primary clouds of F are combined into one new primary cloud as explained in Case 2.1 above
(the edges of F are deleted). The remaining j − j' primary clouds are then repaired as in Case 2.1 by
constructing a secondary cloud between them.
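
The case analysis of this section depends only on the kinds of edges lost with the deleted node. A minimal dispatch sketch is given below; the string labels `"black"`, `"primary"` and `"secondary"` and the function name are illustrative conventions, not notation from the paper.

```python
def classify_deletion(deleted_edge_kinds):
    """Map the kinds of the deleted node's edges to the repair case of Section 3."""
    kinds = set(deleted_edge_kinds)
    unknown = kinds - {"black", "primary", "secondary"}
    if unknown:
        raise ValueError(f"unknown edge kinds: {sorted(unknown)}")
    if "secondary" in kinds:
        return "Case 2.2"  # the deleted node was a bridge node
    if "primary" in kinds:
        return "Case 2.1"  # colored edges lost, all of them primary
    return "Case 1"        # only black edges were lost


if __name__ == "__main__":
    print(classify_deletion(["black", "black"]))
    print(classify_deletion(["black", "primary"]))
    print(classify_deletion(["primary", "secondary"]))
```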

4 Analysis of Xheal 

The following is our main theorem on the guarantees that Xheal provides on the topological
properties of the healed graph. The theorem assumes that Xheal is able to construct a κ-regular expander
(deterministically) of expansion α > 2.

Theorem 2. For graph G_t (the present graph) and graph G'_t (consisting of only original and inserted edges), at
any time t, where a timestep is an insertion or deletion followed by healing:

Figure 3: Case 2.2: Deleted node x part of secondary cloud F, and primary clouds C_1, C_2, ..., C_j.


if node v inserted with incident edges then
    The inserted edges are colored black.
if node v is deleted then
    if all deleted edges are black then
        MakeCloud(BlackNbrs(v), primary, Clr_new)
    else if deleted colored edges are all primary then
        Let [C_1, ..., C_j] be the primary clouds that lost an edge
        FixPrimary([C_1, ..., C_j])
        MakeSecondary([C_1, ..., C_j] ∪ BlackNbrs(v))
    else
        Let [C_1, ..., C_j] ← primary clouds of v; F ← secondary cloud of v;
            [U] ← Clouds(F) \ [C_1, ..., C_j]; [C_1, ..., C_j'] ← F ∩ [C_1, ..., C_j]
        FixPrimary([C_1, ..., C_j])
        FixSecondary(F, v)
        MakeSecondary([C_{j'+1}, ..., C_j] ∪ BlackNbrs(v))

Algorithm 3.1: Xheal(G, κ)


if |V| ≤ κ + 1 then
    Make clique among [V]
else
    Make κ-regular expander among [V] of edge (Type, Clr)

Algorithm 3.2: MakeCloud([V], Type, Clr)


for each cloud C_i ∈ [C] do
    MakeCloud(C_i, primary, Color(C_i))

Algorithm 3.3: FixPrimary([C])



1. For all $x \in G_t$, $\mathrm{degree}_{G_t}(x) \le \kappa \cdot \mathrm{degree}_{G'_t}(x)$, for a fixed constant $\kappa > 0$.

2. For any two nodes $u, v \in G_t$, $\delta_{G_t}(u,v) \le \delta_{G'_t}(u,v) \cdot O(\log n)$, where $\delta(u,v)$ is the length of the shortest path
between u and v, and n is the number of nodes in G_t.

3. $h(G_t) \ge \min(\alpha, h(G'_t))$, for some fixed constant $\alpha \ge 1$.

for each cloud C_i ∈ [C] do
    if (FrNode_i = PickFreeNode(C_i)) == NULL then
        MakeCloud(Nodes([C]), primary, Clr_new)
        Return
MakeCloud(∪ FrNode_i ∀ C_i ∈ [C], secondary, Clr_new)

Algorithm 3.4: MakeSecondary([C])

if v is a bridge node of C_i in F then
    if (FrNode_i = PickFreeNode(C_i)) == NULL then
        MakeCloud(Nodes(F), primary, Clr_new)
    else
        MakeCloud(FrNode_i ∪ BridgeNode(C_j) ∀ C_j ∈ [C], secondary, Color(F))

Algorithm 3.5: FixSecondaryCloud(F, v)

Let a Free node be a primary node without secondary duties
if Free node in my cloud then
    Return Free node
else
    Ask neighbor clouds; if a free node is found, return that node, else return NULL

Algorithm 3.6: PickFreeNode()

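4. $\lambda(G_t) \ge \min\left( \Omega\left( \frac{\lambda(G'_t)^2\, d_{\min}(G'_t)}{\kappa^2\, (d_{\max}(G'_t))^2} \right),\ \Omega\left( \frac{1}{(\kappa\, d_{\max}(G'_t))^2} \right) \right)$, where $d_{\min}(G'_t)$ and $d_{\max}(G'_t)$ are the
minimum and maximum degrees of G'_t.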

From the above theorem, we get an important corollary: 

Corollary 1. If G'_t is a (bounded degree) expander, then so is G_t. In other words, if the graph
formed by the original graph and the inserted edges is an expander, then Xheal guarantees that the
healed graph is also an expander.

4.1 Expansion, Degree and Stretch 

Lemma 1. Suppose at the first timestep (t = 1), a deletion occurs. Then, after healing, $h(G_1) \ge
\min(c, h(G'_1))$, for a constant $c \ge 1$.

Proof. Observe that the initial graphs G_0 and G'_0 are identical. Suppose that node x is deleted at
t = 1. For ease of notation, refer to the graph G_0 as G and the healed graph G_1 as H. Notice
that G'_1 is the same as G_0, since the graph G'_t does not change if the action at time t is a deletion.
Consider the induced subgraph formed by x and its neighbors. Since all the deleted edges are black
edges, Case 1 of the algorithm applies. Thus the healing algorithm will replace this subgraph by a
new subgraph, a κ-regular expander over x's ex-neighbors. Let us call this new subgraph I
(see Figure 4).

Consider a set S(H) which defines the expansion in H, i.e., $|S(H)| \le n/2$ (where n is the
number of nodes in G), and S(H) has the minimum expansion over all the subsets of H. Call
the cut induced by S(H) $E_{S,\bar{S}}(H)$ and its size $|E_{S,\bar{S}}(H)|$. Also refer to the same set in G
(without x if S(H) included x) as S(G), and the cut as $E_{S,\bar{S}}(G)$. The key idea of the proof is
to directly bound the expansion of H, instead of looking at the change of expansion from that of G.
In particular, we have to handle the possibility that our self-healing algorithm may not add any
new edges, because those edges may already be present. (Intuitively, this means that the prior
expansion itself is good.)

Figure 4: Healed graph after deletion of node x. The ball of x and its neighbors gets replaced by
a κ-regular expander of its neighbors (Case 1 of the Algorithm).

We consider two cases depending on whether the healing may or may not have affected this cut. 

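1. $E_{S,\bar{S}}(H) \cap E(I) = \emptyset$:

This implies that only the edges which were in $G$ are involved in the cut $E_{S,\bar{S}}(H)$. Since
expansion is defined as the minimum over all cuts, $|E_{S,\bar{S}}(G)| \ge h(G)\,|S(G)|$. Also, since
$E_{S,\bar{S}}(H) = E_{S,\bar{S}}(G)$ and $|S(H)| \le |S(G)|$, we have:

$$h(H) = \frac{|E_{S,\bar{S}}(H)|}{|S(H)|} \ge \frac{|E_{S,\bar{S}}(G)|}{|S(G)|} \ge h(G).$$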



2. $E_{S,\bar{S}}(H) \cap E(I) \ne \emptyset$: Notice that if there is any minimum expansion cut not intersecting
$E(I)$, part 1 applies, and we are done.

The healing algorithm tries to add enough new edges (if needed) into $I$ so that $I$ itself has
an expansion of $\alpha > 2$ (cf. Algorithm in Section 3). Note that it may not succeed if $|I|$ is too
small. However, in that case, the algorithm makes $I$ a clique and achieves an expansion of $c$
where $c \ge 1$. Thus, we have the following cases:

(a) $I$ has an expansion of $\alpha > 2$:

Consider the nodes in $I$ which are part of $S(H)$, i.e., $B = S(H) \cap I$. We want to calculate
$h(H)$. Since expansion is defined over sets of size not more than half of the size of the
graph, we can do so in two ways:

i. $|B| \le |I|/2$: $S(H)$ expands at least as much as $h(G)$ except for the edges lost to $x$, and
our algorithm ensures that $I$ has expansion of at least $\alpha > 2$. Therefore, we have:

$$h(H) = \frac{|E_{S,\bar{S}}(H)|}{|S(H)|} \ge \frac{(|S(H)| - |B|)\,h(G) - |B| + |B|\,\alpha}{|S(H)|} = \frac{(|S(H)| - |B|)\,h(G) + |B|\,(\alpha - 1)}{|S(H)|}$$

In the numerator above, $(|S(H)| - |B|)\,h(G)$ is a lower bound on
the number of edges emanating from the set $S(H)$ (we subtract $|B|$ from $|S(H)|$ to
account for the edges that may already be present; note that Xheal does not add an
edge between two nodes if one is already present). We subtract another $|B|$ for
the edges lost to the deleted node and add $|B|\,\alpha$ edges due to the expansion gained.

The following cases arise: If $h(G) \ge \alpha - 1$, we have

$$h(H) \ge \frac{|S(H)|\,(\alpha - 1)}{|S(H)|} \ge \alpha - 1 \ge 1.$$

Otherwise, if $h(G) < \alpha - 1$, we get:

$$h(H) \ge \frac{|S(H)|\,h(G)}{|S(H)|} \ge h(G).$$

ii. $|B| > |I|/2$: By construction, the nodes of $\bar{B} = I - B$ expand with expansion at least $\alpha$ in the
subgraph $I$. Similar to above, we get $h(H) \ge \frac{(|S(H)| - |\bar{B}|)\,h(G) + |\bar{B}|\,(\alpha - 1)}{|S(H)|}$. Thus, if
$h(G) \ge \alpha - 1$, then $h(H) \ge \alpha - 1$, else $h(H) \ge h(G)$.

(b) $I$ has an expansion of $c < \alpha$:

This happens when the degree of $x$ is smaller than $\kappa$. In this case, the
expander $I$ is just a clique. Note that, even if the degree of $x$ is 2, the expansion is 1.
(When the degree of $x$ is 1, the deleted node is just dropped, and it is easy to show
that in this case $h(H) \ge h(G)$.) The same analysis as above applies, and we get
$h(H) \ge \min(c', h(G))$, for some constant $c' \ge 1$. Since $G$ is $G'_1$ and $H$ is $G_1$, we get

$$h(G_1) \ge \min(c', h(G'_1)).$$

□

Corollary 2. Given a graph $G$ and a subgraph $B$ of $G$, construct a new graph $H$ as follows:
delete the edges of $B$ and insert an expander of expansion $\alpha > 2$ among the nodes of $B$. Then
$h(H) \ge \min(c, h(G))$, where $c$ is a constant.

Lemma 2. At the end of any timestep $t$, $h(G_t) \ge \min(c', h(G'_t))$, where $c' \ge 1$ is a fixed constant.

Proof. First, consider the case when node $v$ is inserted at time $t$. Observe that the topologies
of both the graphs $G_t$ and $G'_t$ would be the same if all the insertions were to happen before the
deletions. This is because an incoming node comes in with only black edges, and at no step does
the healing algorithm rely on the number of nodes present or use edges for possible future nodes.
Therefore, for our analysis, consider an order in which all the insertions happened before the first
deletion; in particular, think of node $v$ as being inserted at time $s$, and the first deletion happening
at time $s + 1$. Since the graphs $G_i$ and $G'_i$ would look exactly the same for all $i$ before $s + 1$,
insertion of node $v$ changes both the graphs $G_t$ and $G'_t$ in exactly the same way. Thus, if we can
show that our lemma holds when a deletion happens (as we show below), we are done.

Next, we consider that a deletion occurs at timestep $t$. The proof is by induction on $t$.
Lemma 1 already shows the base case, where it is assumed without loss of generality that the first
deletion occurs at time $t = 1$. Notice that before the first deletion, graphs $G$ and $G'$ are identical
and the proof is trivial.

Figure 5: Graphs $G_t$ and $G'_t$ after insertion of node $v$. Graph $G_t$ has some colored clouds. The
nodes which have already been deleted are present in graph $G'_t$ and are shown as unfilled nodes.

Figure 6: Healed graph after deletion of node $x$. The 'black' neighbors of $x$ and some neighbors of
$x$ from color clouds $C_1, C_2, \ldots, C_j$ get connected by a $\kappa$-regular expander of color $C_y$.

As per the algorithm, we have two main cases to consider. 

Case 1: This case occurs when the deleted edges are all black edges. This case is handled
exactly as in the proof of Lemma 1.

Case 2.1 and Case 2.2: We analyze Case 2.1 below, the analysis of Case 2.2 is similar. 

First, we give the proof assuming that each cloud has a free node associated with it. 

Refer to Figure 6. Let $G$ be the original graph and $H$ the healed graph. Let $x$ be the node
deleted. The graph $G$ corresponds to the graph $G_{t-1}$. The graph $G'_{t-1}$ is the same as the graph $G'_t$,
since the graph $G'$ does not change on deletion. By the induction hypothesis, $h(G) = h(G_{t-1}) \ge
\min(c, h(G'_{t-1})) = \min(c, h(G'_t))$. The graph $H$ corresponds to the healed graph $G_t$. Thus, if we show
$h(H) \ge h(G)$, we are done.

In this case, let the deleted node $x$ belong to $j$ primary clouds $C_1$ to $C_j$. (We note that if $x$ has
black neighbors, the algorithm treats them as singleton primary clouds.) First the primary clouds
are restructured by constructing a new $\kappa$-regular expander among the remaining nodes of each cloud
(excluding the deleted node). Then a free node is picked from each color cloud, and these nodes are
connected to form a $\kappa$-regular expander of color, say, $C_x$; this is the secondary cloud.

The proof is a generalization of the argument of Lemma 1. Let $E_{S,\bar{S}}(H)$ be a cut that defines
the expansion in the graph $H$, with $S(H)$ as defined before. Let us call this a minimum cut. If any
minimum cut $E_{S,\bar{S}}(H)$ passes through only the edges of $E(G) - (E(C_1) \cup \cdots \cup E(C_j) \cup E(C_x))$ (i.e.,
outside these clouds), then the expansion of $H$ cannot decrease and we are done. Thus, we will
consider the cases when all minimum cuts pass through some edges of the above clouds.

Each of the colored balls maintains an expansion of at least $\alpha > 2$. Let $B_1, B_2, \ldots, B_j, B_x$ be
the nodes of $S(H)$ in the balls of color $C_1, C_2, \ldots, C_j, C_x$ respectively. (We abuse notation so that each
$C_i$ also denotes the subgraph defining the respective primary cloud.) In the following, for $1 \le i \le j$,
we define $A_i = B_i$ if $|B_i| \le |C_i|/2$; otherwise, if $|B_i| > |C_i|/2$, we define $A_i = \bar{B}_i = C_i - B_i$. $A_x$ is
similarly defined.

We have: 

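$$h(H) \ge \frac{\left(|S(H)| - \sum_{i=1}^{j} |A_i|\right) h(G) - \sum_{i=1}^{j} |A_i| + \sum_{i=1}^{j} |A_i|\,\alpha}{|S(H)|} = \frac{\left(|S(H)| - \sum_{i=1}^{j} |A_i|\right) h(G) + \left(\sum_{i=1}^{j} |A_i|\right)(\alpha - 1)}{|S(H)|}$$

If $h(G) \ge \alpha - 1$, we have:

$$h(H) \ge \alpha - 1 \ge 1$$

If $h(G) < \alpha - 1$, we have:

$$h(H) \ge h(G)$$

Thus, $h(G_t) \ge \min(c', h(G'_t))$, for some $c' = \min(c, \alpha - 1)$, and the induction hypothesis holds.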

The above analysis assumes that each primary cloud had a free node for itself. Otherwise, as
per the algorithm, free nodes from other clouds are shared. If there are a total of $j$ free nodes
among all the $j$ clouds, then the analysis also proceeds as above. The only difference is that when
a free node is shared between two clouds, its degree increases (by $\kappa$). This can only increase the
expansion, and hence the above analysis goes through. The other possibility is that there are fewer
than $j$ free nodes. In this case, all the primary clouds are combined into one single expander cloud.
Here also, the analysis is similar to above. □
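
As a sanity check on the kind of bound Lemma 2 states, the following small Python sketch computes edge expansion by brute force and applies the small-degree variant of Case 1 (the ex-neighbors of the deleted node become a clique). It is a toy illustration only: the function names are hypothetical, the adjacency-dictionary representation is an assumption, and the brute-force search is exponential in the number of nodes.

```python
from itertools import combinations

def edge_expansion(adj):
    """min |E(S, V-S)| / |S| over non-empty S with |S| <= n/2 (brute force)."""
    nodes = list(adj)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            cut = sum(1 for u in s for v in adj[u] if v not in s)
            best = min(best, cut / size)
    return best

def heal_with_clique(adj, x):
    """Delete x and connect its former neighbours into a clique.

    This mirrors only the small-degree situation of Case 1; the general
    algorithm uses a kappa-regular expander instead of a clique.
    """
    nbrs = sorted(adj[x])
    healed = {u: set(vs) - {x} for u, vs in adj.items() if u != x}
    for u, v in combinations(nbrs, 2):
        healed[u].add(v)
        healed[v].add(u)
    return healed

# Toy input: a star with centre 0 and leaves 1..5; deleting the centre
# without healing would disconnect the graph.
star = {0: {1, 2, 3, 4, 5}, **{i: {0} for i in range(1, 6)}}
healed = heal_with_clique(star, 0)
```

On this input the star has expansion 1 and the healed graph (a clique on five nodes) has expansion 3, so the healed expansion is at least min(1, h(G)), in line with the lemma.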

Lemma 3. For all $x \in G_t$, $\mathrm{degree}_{G_t}(x) \le O(\kappa \cdot \mathrm{degree}_{G'_t}(x))$, for a fixed parameter $\kappa > 0$.

Proof. We bound the increase in degree of any node $x$ that belongs to both $G_t$ and $G'_t$. Let the
degree of $x$ in $G'_t$ be $d'(x) = \mathrm{degree}_{G'_t}(x)$. This is the black-degree of $x$ (as $G'_t$ consists solely of
edges present in the original graph plus the inserted edges). There are three cases to consider, and
we bound the degree increase in each:

1. Whenever a black edge gets deleted from this node, the self-healing algorithm adds $\kappa$
colored edges in place of it, because a $\kappa$-regular expander is constructed which includes this node
(this expander can be a primary or a secondary cloud). Thus $x$'s degree can increase by a factor of
at most $\kappa$ because of deletion of black edges.

2. When $x$ loses a colored edge, the algorithm restructures the expander cloud by constructing
a new $\kappa$-regular expander. Again, this is true whether the reconstruction is done on a primary
or a secondary cloud. In this case, the degree of $x$ does not change.

3. Finally, we consider the effect of non-free nodes. $x$'s degree can increase if it is chosen as
a bridge (non-free) node to connect a primary cloud (with which it is associated) to a secondary
cloud. In this case, its degree will increase by $\kappa$, since it will become part of the secondary cloud
expander. There is one more possibility that can increase $x$'s degree by a further $\kappa$. If
$x$ is chosen to be shared as a free node, i.e., it gets associated as a free node with a primary
cloud other than the one it originally belongs to, then its degree increases by $\kappa$ more, since it becomes
part of another $\kappa$-regular expander. The shared node becomes a bridge node, i.e., a non-free node, in that
time step. Hence it cannot be shared henceforth.

From the above, we can bound the degree of $x$ in $G_t$, $d(x) = \mathrm{degree}_{G_t}(x)$, as follows: $d(x) \le
\kappa\, d'(x) + 2\kappa$. The lemma follows. □

Lemma 4. For any two nodes $u, v \in G_t$,

$\delta_{G_t}(u, v) \le \delta_{G'_t}(u, v) \cdot O(\log n)$, where $\delta(u, v)$ is the length of the shortest path between $u$ and $v$, and $n$ is the total
number of nodes in $G_t$.

Proof. We fix two nodes $u$ and $v$ and let the shortest distance between them in $G'_t$ be $\ell$. Since
this is on the graph $G'_t$ (which comprises the original edges plus inserted edges), all the edges on
this path will be black edges. Let this shortest path be denoted by $P = \langle u, u_1, \ldots, u_{\ell-1}, v \rangle$. We
assume that $\ell > 1$, because the path is just the edge $(u, v)$ if $\ell = 1$, in which case there is
nothing to prove (the edge will also be present in $G_t$).

If all the intermediate nodes are present, then the result follows trivially. Otherwise, let
$u'_1, u'_2, \ldots, u'_i$ ($i < \ell$) be the $i$ deleted nodes listed in the order of their deletion (i.e., $u'_1$ was
deleted before $u'_2$ and so on).

We show that each node deletion can increase the distance between $u$ and $v$ by $O(\log n)$.
Consider the deletion of node $u'_1$. This will create a $\kappa$-regular expander (primary or secondary; the
latter case will arise if some incident edges of $u'_1$ are colored) among the neighbors of $u'_1$ in path $P$.
Thus the distance between these neighbors of $u'_1$ will increase by $O(\log(\deg(u'_1))) = O(\log n)$. We
distinguish two cases for subsequent deletions:

1. When the deleted node, say $u'_j$, results in a primary cloud: In this case, the distance between
the neighbors of $u'_j$ will increase by at most $O(\log n)$, as above. Note that any subsequent deletion
of nodes belonging to the primary cloud will still keep the same stretch, as the remaining nodes will
always be connected via a $\kappa$-regular expander.

2. When the deleted node, say $u'_j$, results in a secondary cloud: In this case, there are two
possibilities: (a) The secondary cloud does not comprise primary clouds formed from previous
deletions of nodes in the path $P$. In this case, the increase in distance is $O(\log n)$ as above. (b) The
secondary cloud comprises primary clouds formed from prior deletions of nodes in $P$. Then the
distance between $u$ and $v$ also increases by $O(\log n)$, as one has to traverse through the secondary
cloud (connecting the primary clouds).

Thus, the overall distance between u and v increases by a factor of O(logn) in Gt compared to 
the distance in G' t . 

□ 
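
The stretch in Lemma 4 can be measured empirically on small instances. The sketch below is illustrative only: it assumes unweighted graphs stored as adjacency dictionaries and uses hypothetical function names. It compares breadth-first-search distances in a healed graph against a reference graph that still contains the deleted nodes.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def max_stretch(healed, reference):
    """Largest ratio dist_healed(u, v) / dist_reference(u, v) over surviving pairs."""
    worst = 0.0
    for u in healed:
        d_h = bfs_dist(healed, u)
        d_r = bfs_dist(reference, u)
        for v in healed:
            if v != u:
                worst = max(worst, d_h.get(v, float("inf")) / d_r[v])
    return worst

# Reference graph: a star with centre 0 (the node that gets deleted).
star = {0: {1, 2, 3, 4, 5, 6}, **{i: {0} for i in range(1, 7)}}
# Healed graph: the six ex-neighbours joined in a ring (a stand-in for the
# kappa-regular expander that the algorithm would build).
ring = {i: {(i % 6) + 1, ((i - 2) % 6) + 1} for i in range(1, 7)}
```

Here every pair of leaves is at distance 2 in the star and at most 3 in the ring, so the measured stretch is 1.5.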

4.2 Spectral Analysis 

We derive bounds on the second smallest eigenvalue $\lambda$, which is closely related to properties such
as mixing time, conductance, etc. While it is difficult to derive bounds on $\lambda$ directly, we use our
bounds on edge expansion and Cheeger's inequality to do so.

We need the following simple inequality, which relates the Cheeger constant $\phi(G)$ and the edge
expansion $h(G)$ of a graph $G$ and follows from their respective definitions. We use $d_{\max}(G)$ and
$d_{\min}(G)$ to denote the maximum and minimum node degrees in $G$.

$$\frac{h(G)}{d_{\max}(G)} \le \phi(G) \le \frac{h(G)}{d_{\min}(G)} \qquad (1)$$
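
The relation between edge expansion and the Cheeger constant can be checked numerically on a toy graph. The sketch below is a brute-force illustration and makes an assumption that this section does not spell out: the Cheeger constant is taken to be the conductance, i.e., the cut size divided by the smaller of the two volumes (sums of degrees), while the edge expansion divides the cut size by the set size for sets of at most half the nodes. Function names are hypothetical.

```python
from itertools import combinations

def cut_size(adj, s):
    """Number of edges with exactly one endpoint in s."""
    return sum(1 for u in s for v in adj[u] if v not in s)

def edge_expansion(adj):
    nodes = list(adj)
    return min(cut_size(adj, set(c)) / k
               for k in range(1, len(nodes) // 2 + 1)
               for c in combinations(nodes, k))

def cheeger_constant(adj):
    """Conductance: min over cuts of |cut| / min(vol(S), vol(V - S))."""
    nodes = list(adj)
    total = sum(len(adj[u]) for u in nodes)
    best = float("inf")
    for k in range(1, len(nodes)):
        for c in combinations(nodes, k):
            s = set(c)
            vol = sum(len(adj[u]) for u in s)
            best = min(best, cut_size(adj, s) / min(vol, total - vol))
    return best

# Toy graph: a triangle 0-1-2 with a path 2-3-4 attached.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
h = edge_expansion(g)
phi = cheeger_constant(g)
d_min = min(len(v) for v in g.values())
d_max = max(len(v) for v in g.values())
```

For this graph the edge expansion is 1/2 and the conductance is 1/3, with minimum degree 1 and maximum degree 3, so both sides of the inequality hold.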

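Proof. By Cheeger's inequality and by inequality (1) we have,

$$\lambda(G_t) \ge \frac{1}{2} \left( \frac{h(G_t)}{d_{\max}(G_t)} \right)^2.$$

By Lemma 2 we have $h(G_t) \ge \min(c', h(G'_t))$, for some $c' \ge 1$.
So we have two cases:

Case 1: $h(G_t) \ge h(G'_t)$. By using the other half of Cheeger's inequality, inequality (1), and
Lemma 3, we have:

$$\lambda(G_t) \ge \frac{1}{2} \left( \frac{h(G'_t)}{d_{\max}(G_t)} \right)^2 \ge \frac{1}{2} \left( \frac{\lambda(G'_t)\, d_{\min}(G'_t)}{2\, d_{\max}(G_t)} \right)^2 \ge \Omega\left( \frac{\lambda(G'_t)^2\, d_{\min}(G'_t)}{\kappa^2\, (d_{\max}(G'_t))^2} \right),$$

where the last step uses $d_{\max}(G_t) = O(\kappa\, d_{\max}(G'_t))$ (Lemma 3) and $d_{\min}(G'_t) \ge 1$.

Case 2: $h(G_t) \ge 1$.
This directly gives:

$$\lambda(G_t) \ge \frac{1}{2} \left( \frac{1}{d_{\max}(G_t)} \right)^2 \ge \Omega\left( \frac{1}{(d_{\max}(G_t))^2} \right) \ge \Omega\left( \frac{1}{(\kappa\, d_{\max}(G'_t))^2} \right).$$

□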



5 Distributed Implementation of Xheal: Time and Message Complexity Analysis

We now discuss how to efficiently implement Xheal. A key task in Xheal involves the distributed
construction and maintenance (under insertion and deletion) of a regular expander. We use a
randomized construction of Law and Siu [20] that is described below. The expander graphs of [20]
are formed by constructing a class of regular graphs called H-graphs. An H-graph is a $2d$-regular
multigraph in which the set of edges is composed of $d$ Hamilton cycles. A random graph from
this class can be constructed (cf. Theorem 3 below) by picking $d$ Hamilton cycles independently and
uniformly at random among all possible Hamilton cycles on the set of $z \ge 3$ vertices, and taking
the union of these Hamilton cycles. This construction yields a random regular graph (henceforth
called a random H-graph) that can be shown to be an expander with high probability (cf.
Theorem 4). The construction can be accomplished incrementally as follows.

Let the neighbors of a node $u$ be labeled as
$nbr(u)_{-1}, nbr(u)_{1}, nbr(u)_{-2}, nbr(u)_{2}, \ldots, nbr(u)_{-d}, nbr(u)_{d}$. For each $i$, $nbr(u)_{-i}$ and $nbr(u)_{i}$ denote the node's
predecessor and successor on the $i$th Hamilton cycle (which will be referred to as the level-$i$ cycle).
We start with 3 nodes, because there is only one possible H-graph of size 3.

1. INSERT(u): A new node $u$ is inserted into cycle $i$ between node $v_i$ and node $nbr(v_i)_i$
for randomly chosen $v_i$, for $i = 1, \ldots, d$.

2. DELETE(u): An existing node $u$ gets deleted by simply removing it and connecting $nbr(u)_{-i}$
and $nbr(u)_{i}$, for $i = 1, \ldots, d$.
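
The two operations above can be sketched in Python as follows. This is a centralized, single-process illustration rather than the distributed protocol of [20]: the class name `HGraph`, the successor/predecessor dictionaries, and the `is_valid` helper are all assumptions introduced for the example.

```python
import random

class HGraph:
    """2d-regular multigraph stored as d Hamilton cycles (one succ/pred map per level)."""

    def __init__(self, nodes, d, rng=None):
        nodes = list(nodes)
        if len(nodes) < 3:
            raise ValueError("an H-graph needs at least 3 nodes")
        self.rng = rng or random.Random()
        self.d = d
        self.succ, self.pred = [], []
        for _ in range(d):
            order = nodes[:]
            self.rng.shuffle(order)  # a random cyclic order = a random Hamilton cycle
            succ = {order[k]: order[(k + 1) % len(order)] for k in range(len(order))}
            self.succ.append(succ)
            self.pred.append({v: u for u, v in succ.items()})

    def insert(self, u):
        """INSERT(u): splice u after a randomly chosen node on every level."""
        if u in self.succ[0]:
            raise ValueError("node already present")
        for i in range(self.d):
            v = self.rng.choice(list(self.succ[i]))
            w = self.succ[i][v]
            self.succ[i][v], self.succ[i][u] = u, w
            self.pred[i][w], self.pred[i][u] = u, v

    def delete(self, u):
        """DELETE(u): remove u and join its predecessor and successor on every level."""
        if len(self.succ[0]) <= 3:
            raise ValueError("deletion would leave fewer than 3 nodes")
        for i in range(self.d):
            p = self.pred[i].pop(u)
            s = self.succ[i].pop(u)
            self.succ[i][p] = s
            self.pred[i][s] = p

    def is_valid(self):
        """Check that every level is a single cycle through all nodes."""
        for succ in self.succ:
            start = next(iter(succ))
            seen, cur = set(), start
            while cur not in seen:
                seen.add(cur)
                cur = succ[cur]
            if cur != start or len(seen) != len(succ):
                return False
        return True
```

Each operation touches only a constant number of pointers per level, which is why the per-operation cost depends on $d$ and not on the size of the graph.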

Law and Siu prove the following theorem (modified here for our purposes) that is used in Xheal.

Theorem 3 ([20]). Let $H_0, H_1, H_2, \ldots$ be a sequence of H-graphs, each of size at least 3. Let $H_0$
be a random H-graph of size $n$ and let $H_{i+1}$ be formed from $H_i$ by either an INSERT or a DELETE
operation as above. Then $H_i$ is a random H-graph for all $i \ge 0$.

Theorem 4 ([9], [20]). A random $n$-node $2d$-regular H-graph is an expander (with edge expansion
$\Omega(d)$) with probability at least $1 - O(n^{-p})$, where $p$ depends on $d$.

Note that in the above theorem, the probability guarantee can be made as close to 1 as possible
by making $d$ large enough. Also, it is known that $\lambda$, the second smallest eigenvalue, for these random
graphs is close to the best possible [9]. Another point to note is that although the above construction
can yield a multigraph, it can be shown that similar high-probability guarantees hold in case we
make the multi-edges simple, by making $d$ large enough. Hence we will assume that the constructed
expander graphs are simple.

We next show how the Xheal algorithm is implemented and analyze the time and message complexity
per node deletion. We note that insertion of a node by the adversary involves almost no work
from Xheal. The adversary simply inserts a node and its incident edges (to existing nodes). Xheal
simply colors these inserted edges black. Hence we focus on the steps taken by Xheal under
deletion of a node by the adversary. First we state the following lower bound on the amortized
message complexity for deletions, which is easy to see in our model (cf. Section 2). Our algorithm's
complexity will be within a logarithmic factor of this bound.

Lemma 5. In the worst case, any healing algorithm needs $\Theta(\deg(v))$ messages to repair upon
deletion of a node $v$, where $\deg(v)$ is the degree of $v$ in $G'_t$ (i.e., the black-degree of $v$). Furthermore,
if there are $p$ deletions, $v_1, v_2, \ldots, v_p$, then the amortized cost is $A(p) = (1/p) \sum_{i=1}^{p} \Theta(\deg(v_i))$,
which is the best possible.

Theorem 5. Xheal can be implemented to run in $O(\log n)$ rounds (per deletion). The amortized
message complexity over $p$ deletions is $O(\kappa \log n \, A(p))$ on average, where $n$ is the number of nodes
in the network (at this timestep), $\kappa$ is the degree of the expander used in the construction, and $A(p)$
is defined as in Lemma 5.

Proof. (Sketch) We first note that the healing operations will be initiated by the neighbors of the
deleted node. We also note that primary and secondary expander clouds can be identified by the
color of their edges (cf. Algorithm in Section 3).

Case 1: This involves constructing a (primary) expander cloud among the neighboring nodes
$N(v)$ of the deleted node $v$. Note that $|N(v)| = \deg(v)$, where $\deg(v)$ is the black-degree of $v$.
Since each node knows its neighbor-of-neighbor (NoN) addresses, it is akin to working on a complete
graph over $N(v)$. We first elect a leader among $N(v)$: a random node (which is useful later) among
$N(v)$ is chosen as a leader. This can be done, for example, by using the Nearest Neighbor Tree
(NNT) algorithm of [?]. This takes $O(\log |N(v)|)$ time and $O(|N(v)| \log |N(v)|)$ messages. The
leader then (locally) constructs a random $\kappa$-regular H-graph over $N(v)$ and informs each node in
$N(v)$ (directly, since its address is known) of its respective edges. The total number of messages needed to
inform the nodes is $O(\kappa |N(v)|)$, since that is the total number of edges. A neighbor of the leader in
the expander graph is also elected as a vice-leader. This can be implemented in $O(1)$ time. Hence,
overall this case takes $O(\log |N(v)|) = O(\log \deg(v)) = O(\log n)$ time and $O(\kappa \deg(v) \log \deg(v))$
messages.
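
The leader's local construction step in Case 1 can be sketched as follows. This is a simplified illustration, not the paper's protocol: leader election and actual message passing are omitted, the function name `leader_build_cloud` is hypothetical, parallel edges are merged so the result is a simple graph, and the clique fallback for fewer than $\kappa$ members follows the small-degree case described in the proof of Lemma 1.

```python
import random

def leader_build_cloud(members, kappa, rng=None):
    """Leader-side sketch: build an H-graph of degree at most kappa over `members`.

    Returns (edges, messages): edges maps each member to its neighbour set,
    and messages counts one notification per (node, incident edge) pair,
    which is what the leader would send to inform nodes of their edges.
    """
    rng = rng or random.Random()
    members = list(members)
    if len(members) < kappa:
        # Too few nodes for a kappa-regular graph: fall back to a clique.
        edges = {u: set(members) - {u} for u in members}
    else:
        edges = {u: set() for u in members}
        for _ in range(kappa // 2):          # kappa / 2 random Hamilton cycles
            order = members[:]
            rng.shuffle(order)
            for a, b in zip(order, order[1:] + order[:1]):
                edges[a].add(b)
                edges[b].add(a)
    messages = sum(len(nbrs) for nbrs in edges.values())
    return edges, messages
```

Because every node ends up with at most $\kappa$ neighbours, the notification count stays within $\kappa |N(v)|$, matching the bound quoted above.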

In particular, the following invariants will be maintained with respect to every expander (pri- 
mary or secondary) cloud: (a) Every node in the cloud will have a leader (randomly chosen among 
the nodes) associated with it ; (b) every node in the cloud knows the address of the leader and can 
communicate with it directly (in constant time); and (c) the leader knows the addresses of all other 
nodes in the cloud; (d) one neighbor of the leader in the cloud will be designated vice-leader which 
will know everything the leader knows and will take action in case the leader is deleted. Note that 
this invariant is maintained in Case 1. We will show that it is also maintained in Case 2 below. 

Case 2 (Cases 2.1 and 2.2 of Xheal): We have to implement three main operations in these 
cases. They are: 

(a) Reconstructing an expander cloud (primary or secondary) on deletion of a node $v$: Let $C$ be
the primary (or secondary) cloud that loses $v$. The node is removed according to the DELETE
operation of the H-graph. This takes $O(1)$ time and $O(\kappa)$ messages. If $v$ belongs to $j$ primary clouds,
then the time is still $O(1)$ while the total message complexity is $O(j\kappa)$. For $v$ to belong to $j$ primary
clouds, its black degree should be at least $j$. Also, $v$ can belong to at most one secondary cloud.
Hence the cost is at most $O(\kappa)$ times the black degree, as needed. If the deleted node happens to
be the leader of the (primary) cloud, then a new random leader is chosen (by the vice-leader), which
informs the rest of the nodes; this takes $O(|C|)$ messages and $O(1)$ time, where $|C|$ is the
number of nodes in the cloud. Since the adversary does not know the random choices made by the
algorithm, the probability that it deletes a leader in a step is $1/|C|$, and thus the expected message
complexity is $O((1/|C|) \cdot |C|) = O(1)$. (Note that a new vice-leader, a neighbor of the new leader, will
be chosen if necessary.)

(b) Forming and fixing primary and secondary expander clouds (if there are enough free nodes):
Let the deleted node belong to primary clouds $C_1, \ldots, C_j$ and possibly a secondary cloud $F$ that
connects a subset of these $j$ clouds (and possibly other unaffected primary clouds). First, each
of the clouds is reconstructed as in (a) above. This operation arises only if we have at least $j$
free nodes, i.e., nodes that are not associated with any secondary cloud. We now describe how
free nodes are found. To check if there are enough free nodes among the $j$ clouds, we check the
respective leaders. Each leader always maintains a list of all free nodes in its cloud. Thus, if a node
becomes non-free during a repair, it informs the leader (in constant time), which removes it from
the list. The neighbors of the deleted node can therefore request the leaders of their respective clouds
to find free nodes. Hence finding free nodes takes time $O(1)$ and needs $O(j)$ messages. The free
nodes are then inserted to form the secondary cloud. We distinguish two situations with respect
to formation of a secondary cloud:

(i) The secondary cloud is formed for the first time (i.e., a new
secondary cloud among the primary clouds). In this case, a leader of one of the associated primary
clouds is elected to construct the secondary expander. This leader then gets the free nodes from the
respective primary clouds, locally constructs a $\kappa$-regular expander, and communicates it to the respective
free nodes of each primary cloud. This is similar to the construction of a primary cloud as in (a).
The time and message complexity is also bounded as in (a).

(ii) The secondary cloud is already present, and merely a new free node is added. In this case, the
new node is inserted into the secondary cloud by using the INSERT operation of the H-graph. This
takes $O(1)$ time and $O(1)$ messages, since INSERT can be implemented by querying the leader.

(c) Combining many primary expander clouds into one primary expander cloud (if there are
not enough free nodes): This is a costly operation which we seek to amortize over many deletions.
First, we compute the cost of combining clouds. Let $C_1, \ldots, C_j$ be the clouds that need to be
combined into one cloud $C$. This is done by first electing a leader over all the nodes in the clouds
$C_1, \ldots, C_j$. Note that the distance between any two nodes among these clouds is $O(\log n)$, since all
the clouds had a common node (the deleted node) and each cloud is an expander (also note that
the neighbors of the deleted node maintain connectivity during the leader election and subsequent
repair process). A BFS tree is then constructed over the nodes of the $j$ clouds with
the leader as the root. The leader then collects the addresses of all the nodes in the clouds (via
the BFS tree), locally constructs an H-graph, and broadcasts it to all the other nodes in the
cloud. The leader's address is also communicated to all the other nodes in the cloud. Thus the invariant
specified in Case 1 is maintained. The total time needed is $O(\log n)$ and the total number of
messages needed is $O(\kappa \sum_{i=1}^{j} |C_i| \log n)$, since each node (other than the leader) sends $O(1)$
messages over $O(\log n)$ hops, and the leader sends $O(\sum_{i=1}^{j} |C_i| \log n)$ messages. However, note that the
costly operation of combining is triggered by having fewer than $j$ free nodes. This implies that there
must have been at least $\Omega(\sum_{i=1}^{j} |C_i|)$ prior deletions that had enough free nodes and hence involved no
combining. Thus, we can amortize the total cost of combining over these "cheaper" prior
deletions. Hence the amortized cost is

$$\frac{O\left(\kappa \sum_{i=1}^{j} |C_i| \log n\right)}{\Omega\left(\sum_{i=1}^{j} |C_i|\right)} = O(\kappa \log n).$$

Finally, we describe how the probabilistic guarantee on the H-graph can be maintained. The
implementation above uses a $\kappa$-regular random H-graph in the construction of an expander cloud. By
Theorem 4, $\kappa$ can be chosen large enough to guarantee the probabilistic requirement needed. For
example, choosing $\kappa = \Theta(\log n)$ guarantees high probability (with respect to the size of the network);
this assumes that nodes know an upper bound on the size of the network. Furthermore,
if there are $f$ deletions, by the union bound, the probability that the cloud is not an expander increases
by up to a factor of $f$. To address this, we reconstruct the H-graph after any cloud has lost half
of its nodes; note that the cost of this reconstruction can be amortized over the deletions to obtain
the same bounds as claimed. □

6 Conclusion



We have presented an efficient, distributed algorithm that withstands repeated adversarial node 
insertions and deletions by adding a small number of new edges after each deletion. It maintains 
key global invariants of the network while doing only localized changes and using only local in- 
formation. The global invariants it maintains are as follows. Firstly, assuming the initial network 
was connected, the network stays connected. Secondly, the (edge) expansion of the network is at 
least as good as the expansion would have been without any adversarial deletion, or is at least a 
constant. Thirdly, the distance between any pair of nodes never increases by more than an $O(\log n)$
multiplicative factor over what the distance would be without the adversarial deletions. Lastly,
the above global invariants are achieved while not allowing the degree of any node to increase by 
more than a small multiplicative factor. 

The work can be improved in several ways in similar models. Can we improve the present
algorithm to allow smaller messages and lower congestion? Can we efficiently find new routes
to replace the routes damaged by the deletions? Can we design self-healing algorithms that are
also load balanced? Can we reach a theoretical characterization of what network properties are
amenable to self-healing, especially global properties which can be maintained by local changes?
What about combinations of desired network invariants? We can also extend the work to different
models and domains. We can look at designing algorithms for less flexible networks such as sensor
networks, and explore healing with non-local edges. We can also look beyond graphs to rewiring and
self-healing circuits where it is gates that fail.